Which films are most costly to replace and why?

Of the 1000 films in the database, the average film costs about $19.98 to replace and the individual costs range from $9.99 to $29.99. To get an intuition of why films might be more expensive to replace, it can be helpful to explore the table (Figure 1) below where films are sorted by their replacement cost. It is difficult to learn much this way but as will be shown, there may not even be a definitive connection between a film’s characteristics and its replacement cost.

Note: Language ID and Release Year each only have a single unique value across all of the data.

Figure 1

The histogram of replacement costs (Figure 2) reveals the lack of a clear pattern among the replacement costs. To further explore what drives replacement cost, we can look at its relationship to other variables.

Figure 2

Figure 3 is evidence that there is not a linear relationship between the numerical characteristics of a film (rental duration, rental rate, and length), and its replacement cost. The variable with the correlation of the largest magnitude with replacement cost is rental rate at -0.0446. This is a very weak correlation and an additional reason why it would be warranted to explore the relationship between replacement cost and the categorical characteristics of films.

Figure 3

Figures 4 and 5 are box plots depicting the distribution of replacement cost relative to the unique values of the two categorical characteristics of films in our data: film rating and special features. Curiously, both the minimum and maximum replacement cost associated with each value for both of these characteristics are almost identical. Combining this with the similarities in interquartile range of replacement cost across the characteristics, it suggests that the categorical valuables are also not clearly connected to the replacement cost.

Note: For Figure 4, hover your cursor over individual box plots to better see the special feature groupings that they correspond to.

Figure 4

Figure 5

Conclusion

After this preliminary analysis, there are no easily identifiable connections between a film’s characteristics and its replacement cost. For future analysis it might be helpful to model non-linear relationships and interactions between variables as they relate to replacement cost. This could be done by fitting a tree-based model and interpreting the feature importances or shapley values.

Over time, how often are the different ratings and categories of films rented?

The rental data covers the time period between May 24th, 2005 to February 14th 2006. There is also an alarming 5-month gap in the data between August 2005 and February 2006. This can be seen in Figures 6 and 8. Such a gap, with 0 rentals, suggests a serious, long-term issue with the data pipeline.

Ignoring this error and the small amount of data (16044 rentals across nearly 10 months), we can still attempt some basic analysis. Looking at Figure 6, it appears that throughout the entire timescale of the data, PG-13 movies are rented most often and G movies are rented least often. Films with other ratings have consistently similar rental counts. This relationship lines up very nicely with the the inventory amounts of films with each rating shown in Figure 7.

The connection between rental counts and inventory continues for categories. This can be seen in Figures 8 and 9. It may be helpful to reference the note here for Figure 9, you can start with the most frequent category in the inventory (Sports) and continually add the next most frequent category.

Note: For Figures 6 and 9, it might be helpful to double click the lines in the legend to view one rating or category at a time. You can toggle on other ratings/categories by clicking once on their line. Be aware that isolating a single rating or category can change the scale of the y axis.

Figure 6

Figure 7

Figure 8

Figure 9

Conclusion

For this question, timescale and data quality are the main factors preventing any more complex analysis. The relationship between inventory amounts and rental counts is fairly consistent for this small time window, even as total rental count changes. If there was more data, it would be interesting to see if this relationship continues. It would also facilitate in identifying any trends or seasonalities, while also allowing for more robust forecasting of rentals.

What is the current total outstanding balance of all of our customers?